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Abstract 

In De Leeuw (2008) we studied the derivatives of the least squares rank p approx¬ 
imation in the case of general rectangular matrices. We modify these results for the 
symmetric positive semi-definite case, using basically the same derivation. We apply 
the formulas to compute an expression for the convergence rate of Thomson’s iterative 
principal component algorithm for factor analysis. 
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Note: This is a working paper which will be expanded/updated frequently. All suggestions for 
improvement are welcome. The directory deleeuwpdx.net/pubfolders/rank has a pdf version, 
the complete Rmd file with all code chunks, the bib file, and the R source code. 

1 Introduction 

We have a symmetric matric B that we want to approximate, in the least squares sense, 
by a symmetric positive semi-definite matrix of rank less than or equal to p. This is a 
problem that arises in various forms of multivariate analysis, specifically in factor analysis 
and multidimensinal scaling. A brief history is in De Leeuw (1974). 

Thus the problem is to minimize 


a(X) := SSQ (B — XX') 


( 1 ) 
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over X G M nxp . The stationary equations are 


BX = X(X'X). (2) 

Clearly the loss function is in variant under rotation, which means we can require without 
loss of generality that X'X is diagonal. Thus each column of the optimal X is either 
zero or an eigenvector corresponding with a positive eigenvalue of B. At stationary points 
cr(A^) = SSQ(-B) — tr X'BX. If the signature of B is (n + ,n_,no) then the optimal X 
has min(n+,p) non-zero columns equal to a/A ~ s k s , where the k s are normalized eigenvectors 
corresponding to positive eigenvalues X s , and max(0 ,p — n + ) zero columns. Although the 
optimal X is never unique, the optimal XX' is unique if and only if \ p > A p+ i. 

If B = KAK' is a complete eigen-decomposition, with eigenvalues in decreasing order along 
the diagonal, then 


T P (B) = ApA| (3) 

where the k s are the normalized eigenvectors corresponding with the p largest eigenvalues. 
To guarantee differentiability we assume throughout that \ p > A p+ i as well as A p > 0. 

2 Derivative 

2.1 Formula 

Theorem 1 gives an expression for the first order term in the Taylor expansion 

T P (B + E) = T P (B) + Vr P (B)(E) + o(||B||). (4) 

Note that T p is defined on the space of real doubly-centered symmetric matrices S nxn , with 
values in that same space. The derivative T>F p (B ) at B is a linear operator from S nxn 
to S nxn , which we apply to the real symmetric and doubly-centered perturbation E. The 
symmetry makes our results different from those of De Leeuw (2008). Define the matrix 
S := K'EK with elements ^ s t- 

Theorem 1: [Projection Derivative] 

p p n \ 

VT P (B)(E) = *5ZErfrWM'. + KK). (5) 

s=l s=l t=l 

t^s 

Proof: From De Leeuw (2007) (and many other places) we know that 
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X S (B + E) — X s + £ ss + o(||£||), 

fc s (S + £) = fc s - (B - X s I) + Ek s + o(||E||). 


Thus 


T P (B + E) = £(A S + i ss ){k s -(B- X s I) + Ek s )(k s - (B - X s I) + Ek s )' + o(||£||) = 

S=1 

= r P {B) + E {tssksK - X s k s k' s E(B - X S I) + - X S (B - X s I) + Ek s k' s } + o(||£||) 


S=1 

p 


X, 


= rp(B) + £ £„U4 - E T—VU*‘*! + *4*'.) + o(ll®ll). 


S=1 


t=l 

t^s 


QED 

Note that if B and E are SDC, then so is X j r p (S)(E). 

We generate some random data to illustrate theorem 1. 

set.seed(12345) 
eps <- .001 

b <- crossprod (matrix (rnorm(400) , 100, 4)) / 100 
e <- eps * crossprod (matrix (rnorm(400) , 100, 4)) / 100 


Then compute T p (B + E). 


## 

## [ 1 ,] 
## [ 2 ,] 
## [3,] 
## [4,] 


[, 1 ] 

1.2781679170 

0.1413226188 

0.0155429929 

0.2557498035 


[, 2 ] 

0.1413226188 

0.3754844674 

-0.3835591107 

0.4570154983 


[, 3] 

0.0155429929 

-0.3835591107 

0.4126808466 

-0.4559121477 


[ >4] 

0.2557498035 

0.4570154983 

-0.4559121477 

0.5619744567 


Then compute the zero-order approximation T p (B). 


## 

## [ 1 ,] 
## [ 2 ,] 
## [3,] 
## [4,] 


[, 1 ] 

1.2773620868 

0.1408232165 

0.0156044480 

0.2562189505 


, 2 ] 

0.1408232165 

0.3753719549 

-0.3833537174 

0.4567557210 


,3] 

0.0156044480 

-0.3833537174 

0.4122604048 

-0.4554195169 


[,4] 

0.2562189505 

0.4567557210 

-0.4554195169 

0.5616655283 


And then compute the first-order approximation T p (B) + T>F p (B)(E), using the program 
perturb () from the appendix. 

## C,13 [ , 2 ] [, 3 ] [ , 4 ] 
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## 

[1J 

1.2781680631 

0.1413211691 

0.0155430195 

0.2557508933 

## 

[2,] 

0.1413211691 

0.3754850527 

-0.3835595774 

0.4570158930 

## 

[3,] 

0.0155430195 

-0.3835595774 

0.4126805687 

-0.4559116839 

## 

[4,] 

0.2557508933 

0.4570158930 

-0.4559116839 

0.5619738470 


Clearly the first order approximates is better than the zero order one . For zero order the 
sum of the absolute values of the approximation errors is 0.0056233246, and for first order it 
is 0.0000094019. 

2.2 Structure 

To get more insight into the structure of the derivative we look at its eigenvalues and 
eigenvectors, i.e. those SDC perturbations E for which VT p (B)(E) = uE for some u. 
We investigate, again following De Leeuw (2008), perturbation matrices E of the form 
E = k a k'p + kpk' a: with the k a the eigenvectors of B. It suffices to look at those \n{n — 1) 
perturbations for which a < f3. Note they are orthogonal and consequenty are a basis for the 
space of SDC matrices. Also 


= k' s Ek t = k' s k a k'pk t + k%k' Q k t = 8 sa 8 tp + (6) 

Theorem 2: [Projection Eigen Structure] For a < /3 

if 1 < a < (8 < p, 

if l<a<p<p+l</3<n, (7) 

if P + 1 < ct < P < n. 

Proof: From (6) we have t; ss = 0 if a < (3. Thus the first term on the rhs of (5) is zero. It 
remains to evaluate 


{ kak'n + kpk' a 

Xa-Xp (kakf 3 + kpk c J 

0 


p n \ 

(8 sa 8 tp + 8 sP 8 ta )(k s k' t + k t k' s ). (8) 

.5=1 t= 1 ~ 

t^S 


If a > p then (3 > p and thus (8) is zero. If a <= p and (3 > p then 8 S/3 = 0 and 


P n \ \ 

s (S sa 8 tf} + 8 s H ta ){k s k' t + k t k' s ) = 


E£ A a 

s=l t=l A t - A 
t^s 


s 


^/3 ^OL 


(k a k'p + kpk' a ). 


If a < (3 < p then 

p n \ 

EEvEv 

s= 1 1= 1 
t^s 


/ 7 7/7 7 / \ ^/3 


(8 sa 8^+8^8 ta )(k s k' t +k t k' s ) = —{k a k' p +k p k' a )+—{k a k' p +k p k' a ) = k a k' p +k p k' c 

W — Aq, a _ a ^ 




QED 
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Corollary 1: [Projection Eigen Values] The derivative T>T P (B) has \p{p — 1) eigenvalues 
equal to +1, |(n — p)[n — p — 1) eigenvalues equal to zero, and p[n — p) eigenvalues equal to 
A x “ x , where 1 < a < /3 < n and the A Q are the ordered eigenvalues of B. 

Note that if B is positive definite then the p{n — p) eigenvalues for a < p and f3 > p are 
larger than one, if B is positive semi-definite of rank p they are equal to one. 


3 Thomson’s Factor Analysis Algorithm 

Thomson (1934) proposed an alternating least squares algorithm to find factor analysis 
solutions. We minimize the sum of squares 

a{X, A) = SSQ (B - A - XX') 

over loadings X and the diagonal matrix of uniquenesses A. The algorithm starts with A^ 
and alternates 


C {k) = T P {B- A (fc) ), 
A (fc+1) = diag(5-C (fc) ). 


By substituting the first step into the second step we can think of the algorithm as a map r 
that updates uniquenesses in a vector to a vector A straightforward application 

of theorem 1 shows 


VjTiiS) = Y l k i k l 


2 A ) , , k%s k it k js k jt 

t= 1 A t - V 
ty.s 


using the eigenvalues and eigenvectors of B — A. 


As an example we use the Harman.Burt data from the psych package ((???)). 


data(Harman, package = "psych") 
h3 <- thomson (Harman.Burt, p = 3, 

h2 <- thomson (Harman.Burt, p = 2, 

hi <- thomson (Harman.Burt, p = 1, 


verbose = FALSE, 
verbose = FALSE, 
verbose = FALSE, 


eps 

eps 

eps 


le-15, itmax 
le-15, itmax 
le-15, itmax 


1000 ) 

1000 ) 

1000 ) 


In one dimension we need 21 iterations to get to loss function value 0.5514339263, with an 
estimated convergence rate of 0.4818706151. In two dimensions 107 iterations get us to loss 
0.1486134802 with estimated rate 0.8746643597. And in three dimensions 435 iterations get 
us to 0.0312374332 and rate 0.970287468. The estimated convergence rates correspond closely 
to the largest eigenvalues of the Jacobian, computed by the function tderivative. 
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mprint (eigen (tderivative (Harman.Burt, hl|u, p=l) )$values) 

## [1] 0.4819142774 0.3684002138 0.2782070726 0.2446414810 

## [5] 0.1830500694 0.1244703151 0.1022921260 0.0597542139 

mprint (eigen (tderivative (Harman.Burt, h2|u, p=2) ) lvalues) 

## [1] 0.8746643733 0.6956043771 0.4674777799 0.4166336265 

## [5] 0.3547709022 0.2311944490 0.1756869942 0.0967558960 

mprint (eigen (tderivative (Harman.Burt, h3|u, p=3) ) lvalues) 

## [1] 0.9702874840 0.8848642535 0.8207644608 0.6522238451 

## [5] 0.4197176966 0.3589311780 0.2335541555 0.1622049624 

4 Appendix: Code 

4.1 auxilary.R 

ei <- function (i, n) { 

return (ifelse(i == (l:n), 1, 0)) 

> 

aij <- function (i, j, n) { 
u <- ei(i, n) - ei (j , n) 
return (outer (u, u)) 

> 

kdelta <- function (i, j) { 
ifelse (i == j, 1 , 0) 

> 

mprint <- function (x, d = 10, w = 15) { 
print (noquote (formate ( 
x, di = d, wi = w, fo = "f" 

))) 

> 

mnorm <- function (x, norm = 2) { 

return (sum (abs (x) ~ norm) (1 / norm)) 

> 

basis <- function (i, j, n) { 
s <- sqrt (2) / 2 
a <- matrix (0, n, n) 
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if (i == j) 
a[i, i] <- 1 
else { 

a[i, j] <- a[j , i] <- s 

} 

return (a) 

> 

doubleCenter <- function (x) { 
n <- nrow (x) 
j <- diag(n) - (1 / n) 
return (j 1*1 x 1*1 j) 

} 

center <- function (x) { 

return (apply (x, 2, function (z) z - mean (z))) 

> 

squareDist <- function (x) { 
d <- diag (x) 

return (outer (d, d, "+") - 2 * x) 

} 

lrank <- function (x, p = 2) { 
e <- eigen (x) 
v <- e$vectors[, 1:p] 
return (tcrossprod (x 1*1 v, v)) 

} 

mpower <- function (x, p, eps = le-16) { 
z <- eigen (x, symmetric = TRUE) 
zval <- z$values[1: (nrow(x) - 1)] p 
zvec <- z$vectors[, l:(nrow(x) - 1)] 
return (zvec 1*1 diag (zval) 1*1 t (zvec)) 


4.2 thomson.R 

thomson <- 
function (b, 

uold = rep (0, nrow (b)), 

p = 2, 

itmax = 1000, 
eps = le-10, 
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verbose = TRUE) { 
bold <- b - diag (uold) 

cold <- lrank (bold, p) 

fold <- sum ((bold - cold) ~ 2) 
eold <- Inf 

itel <- 1 

repeat { 

unew <- diag (b - cold) 
enew <- mnorm (uold - unew) 
rnew <- enew / eold 
bnew <- b - diag (unew) 
enew <- lrank (bnew, p) 
fnew <- sum ((bnew - enew) ~ 2) 
if (verbose) { 
cat ( 

formate (itel, width = 4, format = "d"), 
formate ( 
fold, 

digits = 10, 
width = 15, 
format = "f" 

), 

formate ( 
fnew, 

digits = 10, 
width = 15, 
format = "f" 

), 

formate ( 
enew, 

digits = 10, 
width = 15, 
format = "f" 

), 

formate ( 
rnew, 

digits = 10, 
width = 15, 
format = "f" 

), 

"\n" 

) 

} 

if ((itel == itmax) |I (fold - fnew) < eps) 
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break 

itel <- itel + 1 
uold <- unew 
fold <- fnew 
cold <- cnew 
bold <- bnew 
eold <- enew 

} 

return (list ( 
u = unew, 
f = fnew, 
e = enew, 
r = rnew, 
itel = itel 

)) 

} 

tmap <- function (b, u, p = 2) { 
c <- lrank (b - diag (u)) 
return (diag (b - c)) 

} 

tderivative <- function (b, u, p = 2) { 
n <- length (u) 
e <- eigen (b - diag (u)) 
k <- e$vectors 
1 <- e$values 
m <- matrix (0, n, n) 
for (i in l:n) 

for (j in l:n) 
for (s in 1: p) { 

m[i,j] <- m[i,j] + (k[i,s] 2) * (k[j, s] * 2) 

for (t in l:n) { 
if (s == t) 
next 

xi <- l[s] / (1 [t] - l[s]) 
m[i, j] <- 

m[i, j] - 2 * xi * k[i, s] * k[i, t] * k[j, s] * k[j, t] 

} 

} 

return (m) 

} 
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